An efficient class of estimators for finite population mean in the presence of non-response under ranked set sampling (RSS)

In this study, we address the problem of estimating the finite population mean when non-response occurs on the characteristic under study. We propose a class of Rao-regression type estimators for the case where the ranked set sampling (RSS) procedure is used to collect data either from the non-response group only or from both the response and non-response groups. The information provided by the auxiliary variable is used at both the design stage and the estimation stage. Expressions for the bias and mean square error of the estimators are obtained up to the first order of approximation. A comprehensive simulation study is carried out to assess the performance of the estimators under non-response.


The author(s) received no specific funding for this work.
[…] demands that an appropriate sampling method be employed to draw a sample that represents the population well enough. In such cases, ranked set sampling (RSS) is an alternative and more efficient technique for collecting a sample: a larger number of sampling units is ranked by relative size, and a smaller number of units is then picked from each ranked group for observation. As a result, RSS increases the precision of estimates by reducing the sampling error. The availability of appropriate auxiliary variables that are correlated with the study variable is also a factor in improving the estimation. The RSS procedure uses the auxiliary variable at the data collection stage by ranking the units on it.

The theory of RSS was first introduced by McIntyre (1952). In this method, units of the study variable are ordered either by visual judgment or by some cheap quantitative measure. Takahasi & Wakimoto (1968) developed an unbiased estimator of the population mean under RSS. Lynne Stokes (1977) pointed out another important use of the auxiliary variable: the ranks of X can be used to order the units of Y. […] (2021) can also be consulted for the use of this feature of the auxiliary variable. An auxiliary variable can also be used at the estimation stage to reduce sampling variation and increase the efficiency of estimates. Several authors have claimed that using auxiliary variables at the data collection stage can result in more efficient estimates. It is worth noting that adding more auxiliary variables at the estimation stage improves the efficiency of estimates.

Bouza-Herrera (2013) has given a comprehensive treatment of non-response under RSS. His book covers the problem of non-response when the sample is collected through RSS at the second attempt from the non-response group. Inspired by Bouza-Herrera (2013) […]

Non-response is regarded as one of the most important problems in survey theory and data analysis. The occurrence of non-response in the data leads to skewed estimates and a less representative sample of the population. The discrepancy between respondents and non-respondents on a given measure, combined with the non-response rate in the population, produces non-response bias. A lower response rate raises the risk of larger non-response bias, but when data are missing at random (MAR), a lower response rate has no effect on the non-response error. In practice, using information from an auxiliary variable is often costly, but classifying units based on it is quite straightforward. We assume that the RSS technique may improve the precision in estimating the population mean by using the information of the auxiliary variable at the estimation stage.

We consider the naive model of Hansen & Hurwitz (1946), according to which we draw a sample S of size $n_s$ by the SRSWOR method from a finite population $\Omega$ of size $N$. Let $n_1$ units respond to the survey at the first attempt while $n_2$ ($= n - n_1$) units do not respond. Special efforts are made to approach the non-responding units, and a part of them ($n'_2 = n_2/k$; $k > 1$) is included in the sample. Thus, we get a final sample of size $n = n_1 + n'_2$ for estimation purposes. This allows the entire population to be separated into two complementary groups, called the response and non-response groups.

August 15, 2022

Let $\{(Y_{ji}, X_{ji})\}_{i=1}^{N_j}$, $j = 1, 2$, denote the population units of the study variable ($Y$) and the auxiliary variable ($X$) in the two groups. Our goal is to estimate the finite population mean $\bar{Y}$ of the study variable. Hansen & Hurwitz (1946) suggested the following unbiased estimator when non-response occurs on $Y$.
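In its textbook form, the Hansen & Hurwitz estimator weights the first-attempt mean by $n_1$ and the subsample mean of the non-respondents by $n_2$, that is, $(n_1\bar{y}_1 + n_2\bar{y}'_2)/(n_1 + n_2)$. The following is a minimal sketch of that scheme under stated assumptions: the function names, the population sizes and the normal moments are hypothetical, and $n'_2 = n_2/k$ is rounded to an integer.

```python
import random
from statistics import mean

def hansen_hurwitz_mean(y_resp, y_nonresp_sub, n2):
    """Textbook Hansen-Hurwitz estimate: (n1*ybar1 + n2*ybar2') / (n1 + n2).

    y_resp: values from the n1 first-attempt respondents.
    y_nonresp_sub: values from the subsample of non-respondents.
    n2: number of non-respondents in the initial sample.
    """
    n1 = len(y_resp)
    total = sum(y_resp) + (n2 * mean(y_nonresp_sub) if n2 else 0.0)
    return total / (n1 + n2)

def draw_hh_sample(pop_resp, pop_nonresp, n, k, rng):
    """SRSWOR of size n from the pooled population, followed by a 1-in-k
    subsample of the non-respondents that fell into the sample."""
    pooled = [(y, True) for y in pop_resp] + [(y, False) for y in pop_nonresp]
    sample = rng.sample(pooled, n)
    y_resp = [y for y, responds in sample if responds]
    y_non = [y for y, responds in sample if not responds]
    if not y_non:
        return y_resp, [], 0
    m = max(1, round(len(y_non) / k))  # n2' = n2 / k, rounded (assumption)
    return y_resp, rng.sample(y_non, m), len(y_non)

# Hypothetical population: 700 respondents and 300 non-respondents.
rng = random.Random(1)
pop_resp = [rng.gauss(50, 5) for _ in range(700)]
pop_nonresp = [rng.gauss(60, 5) for _ in range(300)]
y1, y2_sub, n2 = draw_hh_sample(pop_resp, pop_nonresp, n=50, k=2, rng=rng)
print("Hansen-Hurwitz estimate:", hansen_hurwitz_mean(y1, y2_sub, n2))
print("population mean:", mean(pop_resp + pop_nonresp))
```

Weighting the subsample mean by $n_2$ rather than by $n'_2$ is what keeps the estimate unbiased: the subsample stands in for all non-respondents in the initial sample.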
The variance of $\bar{y}^{*}_{srs}$ is given by, […]

Similar results can be obtained for an auxiliary variable $X$ when it is involved in the estimation of the population mean of the study variable $Y$, with the following covariance, […]

When the population mean of the auxiliary variable is known and incomplete information exists only on the study variable, Rao (1986) and Rao (1987) suggested the following ratio and regression estimators for estimating $\bar{Y}$, […] and

$\bar{y}_{reg} = \bar{y}^{*} + \beta_{yx}\left(\bar{X} - \bar{x}\right)$ (4)

Cochran (1977) and Khare & Srivastava (1997) defined the following ratio and regression estimators when non-response occurs on both variables, […] where […] are estimates of the population regression coefficient.

Riaz et al. (2014) proposed the following Rao-regression type estimators in the way of Rao (1991), […]
where $d_1$ and $d_2$ are scalars whose values are either pre-determined or chosen to minimize the MSE of the estimator. The minimum MSE of $t_{s1}$ with the optimum values of $d_1$ and $d_2$ is given by, […]

Riaz et al. (2014) also proposed the following generalized class of Rao-regression type estimators by using the idea of Diana et al. (2011), […] where $d_3$ and $d_4$ are constants and $h$ is a generic function of $\bar{u} = \bar{X} - \bar{x}^{*}$ satisfying some mild conditions. The optimum values of $d_3$ and $d_4$ along with the minimum MSE are given by, […]

For more detail, see Diana et al. (2011) and Riaz et al. (2014).

Ranked set sampling (RSS)

The RSS procedure consists of the following steps:

Step 1: Select $\nu$ independent sets, each of size $\nu$, from the population.

Step 2: Arrange the units within each set in ascending order by means of the study variable or any closely related auxiliary variable. The ranking is done either by visual inspection or by some quantitative measurement.

Step 3: Select the lowest order unit from the first set.

Step 4: Select the second lowest order unit from the second set, and continue selecting units in this way until the $\nu$th order statistic is selected from the $\nu$th set.
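The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `ranked_set_sample`, the `rank_key` argument and the repetition over `cycles` (to reach a sample of size $r\nu$) are assumptions introduced here, as is the drawing of $\nu^2$ units per cycle that are then split into $\nu$ sets.

```python
import random

def ranked_set_sample(population, set_size, cycles, rng, rank_key=lambda unit: unit):
    """Draw a ranked set sample of size set_size * cycles.

    population: list of units; rank_key returns the value used for ranking
    (the study variable itself or a cheap, closely related auxiliary variable).
    """
    sample = []
    for _ in range(cycles):
        # Draw set_size**2 distinct units and split them into set_size sets.
        # Units may reappear in a later cycle (simplifying assumption).
        units = rng.sample(population, set_size * set_size)
        sets = [units[i * set_size:(i + 1) * set_size] for i in range(set_size)]
        for i, one_set in enumerate(sets):
            ranked = sorted(one_set, key=rank_key)  # rank the units within the set
            sample.append(ranked[i])                # keep the i-th ranked unit of the i-th set
    return sample

# Hypothetical usage: rank on an auxiliary value x, then measure y only on
# the selected units.
rng = random.Random(7)
population = []
for _ in range(500):
    x = rng.gauss(0, 1)
    population.append((x, 0.8 * x + rng.gauss(0, 0.6)))
selected = ranked_set_sample(population, set_size=3, cycles=4, rng=rng,
                             rank_key=lambda unit: unit[0])
print(len(selected), "units selected;", "first y value:", selected[0][1])
```

Only the selected units need to be measured on the study variable, which is where the cost saving of RSS comes from when ranking is cheap.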

Bouza-Herrera (2013) has done very comprehensive work on dealing with missing observations under ranked set sampling. Here we discuss the following two major situations for dealing with non-response under ranked set sampling.

Situation-I: When RSS is used at second attempt only.

Suppose that data are collected at the second attempt by RSS in such a way that $\nu$ independent sets, each of size $\nu$, are selected from the non-response group. The remaining procedure is followed step by step as discussed in Section 3. The sample mean of the non-responding units based on $n'_2 = r'_2\nu$ units sampled under RSS is given by, […]

If the population is symmetrically distributed, then $\bar{y}'_{2rss}$ is an unbiased estimator of the population mean with the following expected variance, […] where $\Delta'_{2y(i)} = \mu_{2y(i)} - \mu_{2y}$.

Thus Eq (1) becomes, […]

$\bar{y}^{*}_{rss}$ is an unbiased estimator whose variance is given by, […] A similar result can be obtained for the auxiliary variable as, […]

Situation-II: When RSS is used at both attempts (RSS on both groups).

This is an extension of Situation-I in which we collect a sample from both groups by using RSS. The estimator of the sample mean can be written as, […] where $\bar{y}_{1rss}$ is the mean of the sample based on $n_1$ units collected at the first attempt, while $\bar{y}'_{2rss}$ is the sample mean calculated from $n'_2$ units collected at the second attempt. $\bar{y}^{**}_{rss}$ is also an unbiased estimator, with variance given as, […] A similar result can be obtained for the auxiliary variable as, […]

Note that the set size $\nu$ is kept constant, while the other notations are used as

$n_1 = r_1\nu$, $n_2 = r_2\nu$, $n'_2 = r'_2\nu$, $r = r_1 + r'_2$, $n = r\nu$, $k = n_2/n'_2 = r_2/r'_2$.

Generalized class of Rao-regression type estimators under RSS in the presence of non-response

We extend the work of Rao (1991) and Diana et al. (2011) to the estimation of the finite population mean when non-response occurs in surveys and RSS is used for the collection of data instead of SRS.

Our first suggested estimator is, […] where $q_1$ and $q_2$ are constants whose optimum values are used to minimize the error of the estimate.

Equation (20) can be written as, […]

The error terms are defined as $v^{*}_{rss} = \bar{y}^{*}_{rss} - \bar{Y}$ and $\bar{u}^{*}_{rss} = \bar{X} - \bar{x}^{*}_{rss}$; then it is easy to calculate that, […] if RSS is used at the second attempt only, […]

The bias and MSE of $t_{r1}$ are given by, […]

The optimum values of $q_1$ and $q_2$ with the minimum MSE are given as, […]

It should be noted that the value of $q_{1opt}$ is always a positive quantity, while the sign of $q_{2opt}$ depends upon the correlation coefficient between $Y$ and $X$.
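The remark on the signs of the optimum constants can be checked with a short derivation. The sketch below assumes that the first estimator has the Rao (1991) form $t = q_1\bar{y}^{*}_{rss} + q_2(\bar{X} - \bar{x}^{*}_{rss})$ and that both sample means are unbiased; this form is an assumption made for illustration. The symbols $V_y$, $V_x$ and $C_{yx}$ are shorthand introduced here for the variance of $\bar{y}^{*}_{rss}$, the variance of $\bar{x}^{*}_{rss}$ and their covariance, and $\rho^{*}$ is the correlation between the two sample means.

```latex
% Assumed form: t = q_1 \bar{y}^{*}_{rss} + q_2 (\bar{X} - \bar{x}^{*}_{rss}).
% V_y, V_x, C_{yx}, \rho^{*} are illustrative shorthand, not the paper's notation.
\begin{align*}
t - \bar{Y} &= (q_1 - 1)\bar{Y} + q_1 v^{*}_{rss} + q_2 \bar{u}^{*}_{rss}, \\
\mathrm{Bias}(t) &= (q_1 - 1)\bar{Y}, \\
\mathrm{MSE}(t) &= (q_1 - 1)^2 \bar{Y}^2 + q_1^2 V_y + q_2^2 V_x - 2 q_1 q_2 C_{yx}, \\
\frac{\partial\,\mathrm{MSE}}{\partial q_2} = 0
  &\;\Rightarrow\; q_2 = q_1 \frac{C_{yx}}{V_x}, \\
\frac{\partial\,\mathrm{MSE}}{\partial q_1} = 0
  &\;\Rightarrow\; q_{1} = \frac{\bar{Y}^2}{\bar{Y}^2 + V_y\,(1 - \rho^{*2})},
  \qquad \rho^{*2} = \frac{C_{yx}^2}{V_x V_y}, \\
\mathrm{MSE}_{\min}(t) &= \frac{\bar{Y}^2\, V_y\,(1 - \rho^{*2})}{\bar{Y}^2 + V_y\,(1 - \rho^{*2})}.
\end{align*}
```

The minus sign in the cross term arises because $\bar{u}^{*}_{rss} = \bar{X} - \bar{x}^{*}_{rss}$, so $E(v^{*}_{rss}\bar{u}^{*}_{rss}) = -C_{yx}$. Under these assumptions the optimum $q_1$ lies strictly between 0 and 1 whenever $\bar{Y} \neq 0$, and the optimum $q_2$ takes the sign of $C_{yx}$, which agrees with the remark above.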

Similarly, our second suggested general class of estimators is given as, […] where $q_3$ and $q_4$ are pre-determined constants and $h$ is a generic function of $\bar{u}^{*}_{rss} = \bar{X} - \bar{x}^{*}_{rss}$ that satisfies the following mild conditions,

The function $h$ is continuous and bounded in the vicinity of zero.

The proposed class of estimators is determined by the function $h$, which can theoretically and practically take a variety of forms; in our study, we discuss the following two well-known functions. […]

3. If we consider $\tau = -1$, then $\{a, b, c\} = \{1, 1, 0\}$. Hence the proposed estimator is converted to the product type estimator, which is used when the correlation coefficient between the study variable and the auxiliary variable is negative, i.e., […]

Then the proposed class of estimators takes the following form, […]

By expanding $h(\bar{u}^{*}_{rss})$ in a Taylor series, we get, […]

By using the values of the constants $a$, $b$ and $c$, we can easily calculate the corresponding optimum values of $q_3$ and $q_4$ with the minimum MSE of the estimators by using Eq (26), for the two choices of the function $h$ discussed above.

Efficiency comparison
The gain in precision of the proposed class of estimators relies entirely on the precision gained by using RSS instead of SRS. So, it is reasonable to compare the variances of the sample mean under the two competing sampling strategies. Consider the variance of $\bar{y}_{rss}$, […] In the above expression, the term $\sigma^2_y/n$ is the same as the variance of $\bar{y}_{srs}$, which indicates that the RSS procedure will produce more precise estimates than SRS whenever $\mu_{y(i)} \neq \mu_y$ holds. It also follows that if we use the RSS procedure for collecting data at both attempts, then the proposed class of estimators will be more precise than the estimators discussed under Situation-I, where we collect data through RSS only at the second attempt.
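For reference, the comparison rests on a standard identity for balanced RSS (Takahasi & Wakimoto, 1968). The form below is the textbook version, stated under the assumption that the sets are drawn independently from an infinite population with $r$ cycles, set size $\nu$ and $n = r\nu$; the finite-population terms of the original expression are not reproduced.

```latex
% Textbook identity for balanced RSS, n = r\nu (infinite-population assumption).
\begin{align*}
\mathrm{Var}(\bar{y}_{rss})
  &= \frac{1}{r\nu^{2}} \sum_{i=1}^{\nu} \sigma^{2}_{y(i)},
  \qquad
  \sum_{i=1}^{\nu} \sigma^{2}_{y(i)}
  = \nu\,\sigma^{2}_{y} - \sum_{i=1}^{\nu} \left(\mu_{y(i)} - \mu_{y}\right)^{2}, \\
\mathrm{Var}(\bar{y}_{rss})
  &= \frac{\sigma^{2}_{y}}{n}
   - \frac{1}{n\nu} \sum_{i=1}^{\nu} \left(\mu_{y(i)} - \mu_{y}\right)^{2}
  \;\le\; \frac{\sigma^{2}_{y}}{n} = \mathrm{Var}(\bar{y}_{srs}).
\end{align*}
```

Here $\mu_{y(i)}$ and $\sigma^{2}_{y(i)}$ are the mean and variance of the $i$th order statistic in a set of size $\nu$. The subtracted term is non-negative and vanishes only when every $\mu_{y(i)}$ equals $\mu_y$, which is why any informative ranking yields a gain over SRS.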

Simulation study

The simulation study is carried out in the context of the assumptions made by Hansen & Hurwitz (1946). We generated the auxiliary variable for the two groups as $X_j \sim \mathrm{Normal}(N_j, \mu_{yj}, \sigma_{yj})$; $j = 1, 2$. The corresponding study variable is produced by using the relationship $Y_j = \rho_{yx} X_j + e_j\sqrt{1 - \rho^2_{yx}}$, where $e_j \sim \mathrm{Normal}(N_j, 0, 1)$ and $\rho_{yx}$ is the coefficient of correlation between $Y$ and $X$. A sample of size $n = (n_1 + n'_2)$ is selected from the population by using the SRS and RSS procedures, and the sample means are calculated for the study and auxiliary variables under Situation-I and Situation-II. Then the competing and proposed estimators are computed. This procedure is repeated 20,000 times to calculate the MSE and RE of the estimators using the following formula,

$\mathrm{MSE}(t) = \frac{1}{20{,}000}\sum_{i=1}^{20{,}000}\left(t_i - \bar{Y}\right)^2$, […]

where $t$ represents an estimator under consideration. The results are given in Tables 1-6.

Table 1. RE of proposed class of estimators for Situation-I when K = 2. […]

Table 6. RE of proposed class of estimators for Situation-II when K = 4. Columns: $(r_1, r'_2)$; Est; $\rho = 0.10$, $0.50$, $0.90$, each with $\nu = 3, 4, 5$. […]

Tables 1-6 show that the relative efficiency of the proposed estimators increases as the value of the correlation coefficient increases. The relative efficiency also increases with the overall sample size $n$, and it is highest when the moderate set size $\nu = 4$ is used. However, the relative efficiency decreases when the value of the non-response rate K is increased.
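The data-generating step and the Monte Carlo MSE described in this section can be sketched as follows. This is an illustration under several assumptions rather than a reproduction of the study: the group sizes (600 and 400), the means (5 and 7) and the unit standard deviation are hypothetical; $n_1$ and $n_2$ are fixed instead of arising from an initial SRSWOR; the estimators compared are plain weighted sample means, not the proposed Rao-regression class; RE is taken here as the ratio of the SRS-based MSE to the RSS-based MSE; and 2,000 replications are used instead of 20,000 to keep the run short.

```python
import math
import random
from statistics import mean

def make_group(size, mu, sigma, rho, rng):
    """(x, y) pairs with X ~ Normal(mu, sigma), Y = rho*X + e*sqrt(1 - rho^2), e ~ N(0, 1)."""
    pairs = []
    for _ in range(size):
        x = rng.gauss(mu, sigma)
        y = rho * x + rng.gauss(0.0, 1.0) * math.sqrt(1.0 - rho ** 2)
        pairs.append((x, y))
    return pairs

def srs_mean(group, n, rng):
    """Mean of Y over a simple random sample without replacement."""
    return mean(y for _, y in rng.sample(group, n))

def rss_mean(group, nu, cycles, rng):
    """Mean of Y over a ranked set sample, ranking on the auxiliary variable X."""
    ys = []
    for _ in range(cycles):
        units = rng.sample(group, nu * nu)
        for i in range(nu):
            one_set = sorted(units[i * nu:(i + 1) * nu], key=lambda p: p[0])
            ys.append(one_set[i][1])
    return mean(ys)

def mc_mse(estimator, target, reps):
    """Monte Carlo MSE: average of (t_i - target)^2 over reps replications."""
    return sum((estimator() - target) ** 2 for _ in range(reps)) / reps

rng = random.Random(2022)
rho, nu, r1, r2_sub, k, reps = 0.9, 4, 3, 1, 2, 2000
g1 = make_group(600, 5.0, 1.0, rho, rng)  # response group (hypothetical)
g2 = make_group(400, 7.0, 1.0, rho, rng)  # non-response group (hypothetical)
n1, n2_sub = r1 * nu, r2_sub * nu         # n1 = r1*nu, n2' = r2'*nu
n2 = k * n2_sub                           # n2 = k * n2'
w1, w2 = n1 / (n1 + n2), n2 / (n1 + n2)   # equal to N1/N and N2/N by construction
y_bar = mean(y for _, y in g1 + g2)

def srs_estimator():
    return w1 * srs_mean(g1, n1, rng) + w2 * srs_mean(g2, n2_sub, rng)

def rss_estimator():  # RSS at both attempts, as in Situation-II
    return w1 * rss_mean(g1, nu, r1, rng) + w2 * rss_mean(g2, nu, r2_sub, rng)

mse_srs = mc_mse(srs_estimator, y_bar, reps)
mse_rss = mc_mse(rss_estimator, y_bar, reps)
relative_efficiency = mse_srs / mse_rss
print("MSE (SRS):", mse_srs, "MSE (RSS):", mse_rss, "RE:", relative_efficiency)
```

With $\sigma = 1$ the within-group correlation between $X$ and $Y$ equals $\rho_{yx}$, which makes it easy to vary `rho`, `nu` and `k` and observe the direction of the effects reported in the tables.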

From the simulation results, it is observed that the proposed class of estimators provides more precise results for estimating the population mean when non-response occurs in surveys. Furthermore, the relative efficiency of the proposed class of estimators is higher when RSS is used to collect data at both attempts than when RSS is used for data collection only at the second attempt. So, it is suggested to use the proposed class of estimators under the RSS technique. The use of RSS at both attempts of data collection is recommended for a greater gain in the efficiency of estimates.

Acknowledgments

I would like to express my gratitude to my supervisor, Javid Shabbir, who guided me throughout this paper.